Remove separate syntax heads for each operator #575

Open: wants to merge 2 commits into base `kf/dots`

Conversation

@Keno Keno (Member) commented Jul 11, 2025

This replaces all the specialized operator heads by a single K"Operator" head that encodes the precedence level in its flags (except for operators that are also used for non-operator purposes). The operators are already K"Identifier" in the final parse tree. There is very little reason to spend all of the extra effort separating them into separate heads only to undo this later. Moreover, I think it's actively misleading, because it makes people think that they can query things about an operator by looking at the head, which doesn't work for suffixed operators.
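As a toy sketch of the encoding idea (self-contained stand-in types; the flag layout, mask, and levels below are assumptions for illustration, not JuliaSyntax's actual representation):

```julia
# One head for all operators; the precedence level lives in flag bits.
struct Tok
    kind::Symbol     # e.g. :Operator or :Identifier
    flags::UInt16    # low nibble assumed to hold the precedence level
end

const PREC_MASK = 0x000f
prec_level(t::Tok) = Int(t.flags & PREC_MASK)

plus  = Tok(:Operator, 0x0006)   # made-up additive precedence level
times = Tok(:Operator, 0x0007)   # made-up multiplicative precedence level
@assert prec_level(times) > prec_level(plus)  # precedence is a flag query, not a kind check
```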

Additionally, this removes the `op=` token, replacing it with two tokens: one K"Operator" with a special precedence level and one `=`. This then removes the last use of `bump_split` (since this PR is on top of #573).
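The observable lexing change, expressed with the `toks` test helper from this repo's test suite (the same helper the review below suggests; a sketch, assuming the helper is in scope):

```julia
# `1+=2` now lexes `+` and `=` as two separate tokens rather than one fused
# `+=` token (token 1 is the literal `1`):
@test toks("1+=2")[2:3] == ["+" => K"Operator", "=" => K"="]
```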

As a free bonus, this prepares us for having compound assignment syntax for suffixed operators, which was infeasible in the flisp parser. That syntax change is not part of this PR but would be trivial (this PR makes it an explicit error).

Fixes #334

Unfortunately, the sequences `..` and `...` do not always refer
to the `..` operator or the `...` syntax. There are two and a half cases
where they don't:

1. After `@` in a macro call, where they are both regular identifiers
2. In `import ...A`, where the dots specify the relative import level
3. In `:(...)`, where `...` is treated as a quoted identifier

Case 1 was handled in a previous commit by lexing these as identifiers
after `@`.

However, as a result of case 2, it is problematic to tokenize these dots
together; we essentially have to untokenize them in the import parser. It
is also infeasible to give the lexer special context-sensitive lexing in
`import`, because there can be arbitrary interpolations, e.g.
`@eval import A, $(f(x..y)), ..b`, so deciding whether a particular
`..` after `import` refers to the operator or a level specifier requires
the parser.
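To make the ambiguity concrete (using the `parsestmt` API that also appears in the review below):

```julia
using JuliaSyntax
# The same two characters mean different things depending on context:
parsestmt(SyntaxNode, "import ..A")  # dots are a relative-import level specifier
parsestmt(SyntaxNode, "a .. b")      # `..` is an ordinary binary operator
```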

Currently the parser handles this by re-splitting the obtained tokens
in the import parser, but this is undesirable because it breaks the
invariant that the tokens produced by the lexer correspond to the
terminals (leaves) of the final parse tree.

This PR attempts to address this by only ever having the lexer emit
`K"."` and letting the parser decide which case it refers to.
The new non-terminal `K"dots"` handles the identifier cases (ordinary
`..` and quoted `:(...)`). `K"..."` is now used exclusively for
splat/slurp, and is no longer used in its non-terminal form for
case 3.
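Observable behavior under the new scheme (a sketch; which heads appear where is per this PR's description):

```julia
using JuliaSyntax
parsestmt(SyntaxNode, "a .. b")    # ordinary `..` operator, an identifier case
parsestmt(SyntaxNode, ":(...)")    # quoted `...`, the other identifier case (K"dots")
parsestmt(SyntaxNode, "f(x...)")   # splat, the one remaining use of K"..."
```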

codecov bot commented Jul 11, 2025

Codecov Report

Attention: Patch coverage is 98.11321% with 4 lines in your changes missing coverage. Please review.

Please upload report for BASE (kf/dots@daf52ca).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| src/julia/kinds.jl | 85.00% | 3 Missing ⚠️ |
| src/julia/tokenize.jl | 99.12% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             kf/dots     #575   +/-   ##
==========================================
  Coverage           ?   95.41%           
==========================================
  Files              ?       16           
  Lines              ?     4578           
  Branches           ?        0           
==========================================
  Hits               ?     4368           
  Misses             ?      210           
  Partials           ?        0           

@c42f c42f (Member) left a comment

Ok, so I like this a lot in overview and I think the idea is right.

But there's a fair bit to clean up in the implementation and I'm going to be honest, this took an absolute ton of time to review.

One pervasive issue I find confusing in the parser.jl code changes is that the `isassign` output of `peek_dotted_op_token()` is often ignored, but not always. For which cases is this actually ok? One practical difference between ignoring `isassign` and not ignoring it shows up in the following errors:

`isassign` checked for:

julia> parsestmt(SyntaxNode, "x |>= y")
ERROR: ParseError:
# Error @ line 1:3
x |>= y
# └┘ ── Compound assignment is not allowed for this operator

vs `isassign` not checked for:

julia> parsestmt(SyntaxNode, "x ..= y")
ERROR: ParseError:
# Error @ line 1:5
x ..= y
#   ╙ ── unexpected `=`

Was Claude or another AI tool used for the code changes?

I feel we may need a bunch more tests to check that `isassign` is used correctly.
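A sketch of such tests, based on the two errors shown above (hedged; the exact error kinds and messages are per this PR branch):

```julia
using JuliaSyntax, Test
# Both forms should be rejected; ideally the first with the operator-specific
# "Compound assignment is not allowed for this operator" message:
@test_throws JuliaSyntax.ParseError parsestmt(SyntaxNode, "x |>= y")
@test_throws JuliaSyntax.ParseError parsestmt(SyntaxNode, "x ..= y")
```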

Comment on lines +180 to +181
@test tok("1+=2", 2).kind == K"Operator" # + before =
@test tok("1+=2", 3).kind == K"="

For testing multiple tokens with the same input, I suggest `toks()`:

Suggested change
@test tok("1+=2", 2).kind == K"Operator" # + before =
@test tok("1+=2", 3).kind == K"="
@test toks("1+=2")[2:3] == ["+"=>K"Operator", "="=>K"="]

(a lot of tests for tokenize.jl were written over time with various test tooling and haven't necessarily been updated to the latest way to do these things)

@@ -1217,5 +653,5 @@ function is_syntactic_operator(x)
# in the parser? The lexer itself usually disallows such tokens, so it's
# not clear whether we need to handle them. (Though note `.->` is a
# token...)
-    return k in KSet"&& || . ... ->" || is_syntactic_assignment(k)
+    return k in KSet"&& || . ... -> = :="

With this change we now have

julia> JuliaSyntax.is_syntactic_operator(K".=")
false

Whereas it used to be true. Was this intentional?

-function is_plain_equals(t)
-    kind(t) == K"=" && !is_suffixed(t)
-end
+is_plain_equals(t) = kind(t) == K"="

Let's remove this function; it's only used in two places and the test is now trivial.

Comment on lines 427 to 431
"""
emit(l::Lexer, kind::Kind)

Returns a `RawToken` of kind `kind` with contents `str` and starts a new `RawToken`.
"""

Suggested change
"""
emit(l::Lexer, kind::Kind)
Returns a `RawToken` of kind `kind` with contents `str` and starts a new `RawToken`.
"""

wrong docstring

@@ -608,14 +600,18 @@ function parse_assignment_with_initial_ex(ps::ParseState, mark, down::T) where {
# a += b ==> (+= a b)

Comment needs fixing I guess (or delete it because it's covered in the if-else below)

emit(ps, mark, leading_dot ? K".op=" : K"op=")
if check_identifiers
# += ==> (error (op= +))
# .+= ==> (error (. (op= +)))

This is not correct

Suggested change
# .+= ==> (error (. (op= +)))
# .+= ==> (error (.op= +))

@@ -76,6 +76,8 @@ tests = [
"f(x) where S where U = 1" => "(function-= (where (where (call f x) S) U) 1)"
"(f(x)::T) where S = 1" => "(function-= (where (parens (::-i (call f x) T)) S) 1)"
"f(x) = 1 = 2" => "(function-= (call f x) (= 1 2))" # Should be a warning!
# Bad assignment with suffixed op
((v = v"1.12",), "a +₁= b") => "(op= a (error +₁) b)"

This is not a version-specific error as implemented

Suggested change
((v = v"1.12",), "a +₁= b") => "(op= a (error +₁) b)"
"a +₁= b" => "(op= a (error +₁) b)"

Comment on lines +218 to +220
tokens = tokenize("+₁")
@test length(tokens) == 1 # Just the identifier, endmarker is not included in tokenize()
@test kind(tokens[1]) == K"Identifier"

Suggested change
tokens = tokenize("+₁")
@test length(tokens) == 1 # Just the identifier, endmarker is not included in tokenize()
@test kind(tokens[1]) == K"Identifier"
@test tokensplit("+₁") == [K"Identifier"=>"+₁"]

@testset "dotted and suffixed operators" begin

for opkind in Tokenize._nondot_symbolic_operator_kinds()
for opkind in _nondot_symbolic_operator_kinds()

This seems incorrect: this test now omits many operators. It used to rely on the fact that all the operator kinds were listed individually.

Instead, I guess we should have a big list of all the allowable operators here, separate from the list in Tokenize.
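A sketch of that suggestion (the list contents are illustrative, not exhaustive; `tok` is the test helper used elsewhere in this thread):

```julia
# Explicit operator list maintained in the test file itself, independent of
# Tokenize's internal tables:
known_operators = ["+", "-", "*", "/", "^", "==", "<", "|>", "÷"]
for op in known_operators
    @test tok(op, 1).kind == K"Operator"
end
```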

is_syntactic_operator(leading_kind) ? leading_kind : K"Identifier")

# Check if this is a compound assignment operator pattern

redundant comment

Suggested change
# Check if this is a compound assignment operator pattern

@Keno Keno (Member, Author) commented Aug 10, 2025

Thanks for the extensive review; I'll wait until the other PR is merged to rebase and fix those up.

Was Claude or another AI tool used for the code changes?

I used Claude off and on for the big changeset this was extracted from (along with several of the earlier PRs). However, I think the things you flagged as most objectionable are not Claude's fault, but rather an artifact of multiple rounds of iteration and rebasing as the earlier patches in this sequence were cleaned up and landed individually.
